Abstract

We give algorithms for the optimization problem $\max_\rho \langle Q, \rho\rangle$, where $Q$ is a Hermitian matrix
and the variable $\rho$ is a bipartite separable quantum state. This problem lies at the heart of several
problems in quantum computation and information, such as the complexity of QMA(2).
While the problem is NP-hard, our algorithms are better than brute force for several instances
of interest. In particular, they give PSPACE upper bounds on promise problems admitting a
QMA(2) protocol in which the verifier performs only a logarithmic number of elementary gates
on both proofs, as well as on the promise problem of deciding whether a bipartite local Hamiltonian has
large or small ground energy. For $Q \geq 0$, our algorithm runs in time exponential in $\|Q\|_F$.
While the existence of such an algorithm was first proved recently by Brandão, Christandl and
Yard [Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, 343-352, 2011],
our algorithm is conceptually simpler.

1 Introduction 

Entanglement is an essential ingredient in many ingenious applications of quantum information
processing. Understanding and exploiting entanglement remains a central theme in quantum
information processing research [HHH+09]. Denote by $\mathrm{SepD}(\mathcal{A}_1 \otimes \mathcal{A}_2)$ the set of separable (i.e., unentangled)
density operators over the space $\mathcal{A}_1 \otimes \mathcal{A}_2$. A fundamental question, known as the weak
membership problem for separability, is to decide, given a classical description of a quantum state $\rho$
over $\mathcal{A}_1 \otimes \mathcal{A}_2$, whether $\rho$ is inside or $\epsilon$-far away in trace distance from $\mathrm{SepD}(\mathcal{A}_1 \otimes \mathcal{A}_2)$.
Unfortunately, this very basic problem turns out to be intractable. In 2003, Gurvits [Gur03] proved
the NP-hardness of the problem when $\epsilon$ is inverse exponential in the dimension of $\mathcal{A}_1 \otimes \mathcal{A}_2$. The
dependence on $\epsilon$ was later improved to inverse polynomial [Ioa07, Gha10].

In this paper we study a problem closely related to the weak membership problem discussed
above. More precisely, we consider the linear optimization problem over separable states.

Problem 1. Given a Hermitian matrix $Q$ over $\mathcal{A}_1 \otimes \mathcal{A}_2$ (of dimension $d \times d$), compute the optimum
value, denoted by OptSep(Q), of the optimization problem

$$\max \ \langle Q, X\rangle \quad \text{subject to} \quad X \in \mathrm{SepD}(\mathcal{A}_1 \otimes \mathcal{A}_2).$$

It is a standard fact in convex optimization [GLS93, Ioa07] that the weak membership problem
and the weak linear optimization problem over certain convex sets, such as
$\mathrm{SepD}(\mathcal{A}_1 \otimes \mathcal{A}_2)$ (where the latter is Problem 1), are equivalent up to a polynomial loss in precision and polynomial-time overhead.
Thus the hardness result for the weak membership problem for separability passes directly to
Problem 1.
Besides the connection with the weak membership problem for separability, Problem 1 can also
be understood from many other aspects. Firstly, as the objective function is the inner product of a
Hermitian matrix and a quantum state, which represents the average value of some physical
observable, the optimal value of Problem 1 inherently possesses a physical meaning. Secondly,
in the study of tensor product spaces [DF92], the value OptSep(Q) is precisely the injective norm
of $Q$ in $\mathcal{L}(\mathcal{A}_1) \otimes \mathcal{L}(\mathcal{A}_2)$, where $\mathcal{L}(\mathcal{A})$ denotes the Banach space of operators on $\mathcal{A}$ with the operator
norm. Finally, one may be equally motivated by work in operations research.
Problem 1 appeared in an equivalent form in [LQNY09] under the name "Bi-Quadratic
Optimization over Unit Spheres". Subsequent works [HLZ10, So11] demonstrate that Problem 1
is a special case of a more general class of optimization problems called homogeneous polynomial
optimization with quadratic constraints, which is currently an active research topic in that
field.

Another motivation to study Problem 1 is the recent interest in the complexity class QMA(2).
Originally the class QMA (also known as quantum proofs) was defined [KSV02] as the quantum
counterpart of the classical complexity class NP. While the extension of NP to allow multiple
provers trivially reduces to NP itself, the power of QMA(2), the extension of QMA with multiple
unentangled provers, remains far from being well understood. The study of the multiple-prover
model was initiated in [KMY01, KMY03], where QMA(k) denotes the complexity class for
the $k$-prover case. This model attracted much attention because of the discovery that NP
admits logarithmic-size unentangled quantum proofs [BT09]. This result was surprising because
single-prover logarithmic-size quantum proofs only characterize BQP [MW05]. It seems that adding
one unentangled prover increases the power of the model substantially. Several subsequent
works refine the initial protocol, either with improved completeness and soundness
bounds [Bei10, ABD+09, CF11, GNN11] or with less powerful verifiers [CD10]. Recently it was
proved that QMA(2)=QMA(poly) [HM10] by using the so-called product test protocol, which determines
whether a multipartite state is a product state when two copies of it are given. There is
another line of research on the power of unentangled quantum proofs with restricted verifiers.
Two complexity classes, BellQMA and LOCCQMA, referring to restricted verifiers that perform
only nonadaptive or adaptive local measurements respectively, were defined in [ABD+09]
and studied in [Bra08, BCY11]. It has been shown [BCY11] that LOCCQMA(m) is equal to QMA
for constant m.

Despite much effort, no nontrivial upper bound on QMA(2) is known. The best known upper
bound, QMA(2) $\subseteq$ NEXP, follows trivially by nondeterministically guessing the two proofs. It
would be surprising if QMA(2) = NEXP. Thus it is reasonable to seek a better upper bound such as
EXP or even PSPACE. It is not hard to see that simulating QMA(2) amounts to distinguishing
between two promises on OptSep(Q), although one has the freedom to choose the appropriate Q.
Note that Problem 1 was also studied in [BCY11] for the same purpose.

Hardness result. There are several approaches to proving the hardness of Problem 1. The first is
to make use of the NP-hardness of the weak membership problem and the folk theorem in convex
optimization mentioned above. Alternatively, one may directly reduce the CLIQUE problem
to Problem 1 [deK08, LQNY09]. There is also a stronger hardness result [HM10] on the exact
running time of algorithms solving Problem 1, conditioned on the Exponential Time Hypothesis
(ETH) [IP01]. The hardness results extend naturally to the approximation version of Problem 1. It
is known that OptSep(Q) remains NP-hard to compute even if inverse-polynomial additive
error is allowed. Nevertheless, it is wide open whether the hardness result remains if one allows
even larger additive error.
From the perspective of operations research, the hardness of Problem 1 is a consequence of
its not being a convex optimization problem. In this case, although methods that are efficient compared with
brute force usually exist for finding a local optimum, finding the global one
is fraught with difficulty. This is because, in the worst case, one needs to enumerate all possible local optima before
one can determine the global optimum.

Our contributions. In this paper we provide algorithms for Problem 1 that are efficient in either time or
space for several Qs of interest. As the hardness result implies that enumeration is likely to be
inevitable in the worst case, our idea is to enumerate via epsilon-nets more "cleverly" with the
help of certain structure of Q.

When the total number of points to enumerate is not large, one can represent and hence enumerate
each point in polynomial space. If the additional computation for each point can also
be done in polynomial space, one immediately gets a polynomial-space implementation of the
whole algorithm by composing the two components naturally. We make use of the relation
NC(poly)=PSPACE [Bor77] to obtain a space-efficient implementation of the additional computation,
which in our case consists of the following two parts. The first part
makes sure the enumeration procedure works correctly: because the epsilon-nets of interest
in our algorithm are not standard, additional effort is necessary to generate them. This
part turns into a simple application of the so-called matrix multiplicative weight update (MMW)
method [AHK05, WK06, Kal07] to computing a min-max form, which is known to admit efficient
parallel algorithms under certain conditions. The second part contains the real computation,
which in our case consists only of fundamental matrix operations. It is well known that those operations
usually admit efficient parallel algorithms [Gat93]. As a result, both parts of the additional
computation admit efficient parallel algorithms, and therefore the additional computation can be
implemented in polynomial space in our case.

We summarize below the main results obtained by applying the above ideas. 

1. The first property exploited is the so-called decomposability of $Q$, which refers to whether $Q$ can
be decomposed in the form $Q = \sum_{i=1}^{M} Q_i^1 \otimes Q_i^2$ with small $M$. Note that this concept is closely related
to a more commonly studied concept, tensor rank. Intuitively, if one substitutes this decomposition of $Q$
into $\langle Q, \rho_1 \otimes \rho_2\rangle$ and treats $\langle Q_1^1, \rho_1\rangle, \ldots, \langle Q_M^1, \rho_1\rangle, \langle Q_1^2, \rho_2\rangle, \ldots, \langle Q_M^2, \rho_2\rangle$ as variables,
the optimization problem becomes quadratic and $M$ corresponds to the number of second-order
terms in the objective function. If we plug the values of $\langle Q_1^1, \rho_1\rangle, \ldots, \langle Q_M^1, \rho_1\rangle$ into the objective
function, then the optimization problem reduces to a semidefinite program, and thus can be
efficiently solved. Hence, by enumerating all possible values of $\langle Q_1^1, \rho_1\rangle, \ldots, \langle Q_M^1, \rho_1\rangle$, one can
efficiently solve the original problem when $M$ is small. Since this approach naturally extends to
the $k$-partite case for $k \geq 2$, we obtain the following general result.

Theorem 1 (Informal. See Section 3). Given any Hermitian $Q$ and its decomposition $Q = \sum_{i=1}^{M} Q_i^1 \otimes \cdots \otimes Q_i^k$
as input, the quantity OptSep(Q) can be approximated with additive error $\delta$ in quasi-polynomial
time^1 in $d$ and $1/\delta$ if $kM$ is bounded by poly-logarithms of $d$.
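As a worked illustration of the substitution behind Theorem 1, here is a sketch for the bipartite case $k = 2$, using the inner product $\langle A, B\rangle = \mathrm{Tr}(A^* B)$. The shorthand $p_i$ is introduced here only for illustration.

```latex
\[
\langle Q, \rho_1 \otimes \rho_2 \rangle
  = \sum_{i=1}^{M} \langle Q_i^1 \otimes Q_i^2,\ \rho_1 \otimes \rho_2 \rangle
  = \sum_{i=1}^{M} \langle Q_i^1, \rho_1 \rangle \, \langle Q_i^2, \rho_2 \rangle
  = \sum_{i=1}^{M} p_i \, \langle Q_i^2, \rho_2 \rangle,
\qquad p_i := \langle Q_i^1, \rho_1 \rangle .
\]
```

Once the $M$ numbers $p_1, \ldots, p_M$ are fixed, what remains is linear in $\rho_2$. This is why only the values $p_i$, and not $\rho_1$ itself, need to be enumerated.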

By exploiting the space-efficient algorithm design strategy above, this algorithm can also be
made space-efficient. To facilitate the later applications to complexity classes, we choose the input
size to be some $n$ such that $d = \exp(\mathrm{poly}(n))$.

Corollary 1 (Informal. See Section 3). If $kM/\delta \in O(\mathrm{poly}(n))$, the quantity OptSep(Q) can be
approximated with additive error $\delta$ in PSPACE.

^1 Quasi-polynomial time is upper bounded by $2^{O((\log n)^c)}$ for some fixed $c$, where $n$ is the input size.
As a direct application, we prove that the following variant of QMA(2) belongs to PSPACE, where
QMA(2)[poly(n), O(log(n))] refers to the model in which the verifier performs only O(log(n)) elementary
gates that act on both proofs at the same time and a polynomial number of other elementary
gates. Note that QMA(2)[poly(n), poly(n)] = QMA(2) in our notation.

Corollary 2. QMA(2)[poly(n), O(log(n))] $\subseteq$ PSPACE.

This result establishes the first PSPACE upper bound for a variant of QMA(2) where the verifier
is allowed to generate some quantum entanglement between the two proofs. In contrast, previous
results are all about variants with nonadaptive or adaptive local measurements, such as
BellQMA(2) [ABD+09, Bra08, CD10] or LOCCQMA(2) [ABD+09, BCY11].

We also study Problem 1 when Q is a local Hamiltonian over k parties. Recall that a promise
version of this problem in the one-party case, namely the local Hamiltonian problem, is
QMA-complete [KSV02]. Our definition extends the original local Hamiltonian problem to
its $k$-partite version. However, as will be clear in the main section, the $k$-partite local Hamiltonian
problem is no longer necessarily QMA(k)-complete. On the other hand, our enumeration algorithm
based on the decomposability of Q works extremely well in this case. As a result, we obtain the
following corollary.

Corollary 3 (Informal. See Section 5). Given some local Hamiltonian Q over k parties as input,
OptSep(Q) can be approximated with additive error $\delta$ in quasi-polynomial time in $d$ and $1/\delta$; the
$k$-partite local Hamiltonian problem belongs to PSPACE.

Very recently, a result [CS11] obtained independently of ours shows that the 2-partite local Hamiltonian
problem defined above lies in QMA, and hence in PSPACE, which complements our algorithmic
result.

2. The second structure exploited is the eigenspace of Q corresponding to large eigenvalues. As a result, we
establish an algorithm solving Problem 1 with running time exponential in $\|Q\|_F$.

Theorem 2 (Informal. See Section 6). For positive semidefinite Q, the quantity OptSep(Q) can be
approximated with additive error $\delta$ in time $\exp(O(\log(d) + \delta^{-2}\|Q\|_F^2 \ln(\|Q\|_F/\delta)))$.

A similar running time $\exp(O(\log^2(d)\, \delta^{-2} \|Q\|_F^2))$ was obtained in [BCY11] using some known
results in quantum information theory (i.e., the semidefinite programming for finding symmetric
extensions [DPS04] and an improved quantum de Finetti-type bound). In contrast, our algorithm
only uses fundamental operations of matrices and epsilon-nets. To approximate with precision
$\delta$, it suffices to consider the eigenspace of Q of eigenvalues greater than $\delta$, whose dimension is
bounded by $\|Q\|_F^2/\delta^2$. Nevertheless, naively enumerating density operators over that subspace
does not work since one cannot detect the separability of those density operators. We circumvent
this difficulty by making nontrivial use of the Schmidt decomposition of bipartite pure states.
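The dimension bound can be checked in one line. The following is a short derivation under the assumption that $Q \geq 0$ has eigenvalues $\lambda_1, \ldots, \lambda_D$ (notation introduced here for illustration) and that $m$ of them exceed $\delta$:

```latex
\[
m\,\delta^2 \;\le\; \sum_{i:\ \lambda_i > \delta} \lambda_i^2
  \;\le\; \sum_{i=1}^{D} \lambda_i^2 \;=\; \|Q\|_F^2
\quad\Longrightarrow\quad
m \;\le\; \frac{\|Q\|_F^2}{\delta^2}.
\]
```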

We note, however, that other results in [BCY11] do not follow from our algorithm, and our
method cannot be seen as a replacement of the kernel technique therein. Furthermore, our method
does not extend to the $k$-partite case, as there is no Schmidt decomposition in that case.

Open problems. The main open problem is whether Problem 1 admits an efficient algorithm in
either time or space when larger additive error is allowed. It is also interesting to see whether, for
those Qs that come from the simulation of the complexity class QMA(2), the quantity OptSep(Q)
can be efficiently computed.
Organization: The rest of this paper is organized as follows. The necessary background
on the particular epsilon-nets in use is introduced in Section 2. The main algorithm
based on the decomposability of Q is presented in Section 3. Two applications of this algorithm
are discussed immediately after: the simulation of variants of QMA(2) in Section 4,
and the local Hamiltonian case in Section 5. Finally, an
algorithm with running time exponential in $\|Q\|_F$ for Problem 1 can be found in Section 6.

Notations: We assume familiarity with standard concepts from quantum information [NC00,
KSV02, Wat08]. In particular, our notation follows [Wat08]. Precisely, we use $\mathcal{A}, \mathcal{B}$ to denote
complex Euclidean spaces and $L(\mathcal{A})$, $\mathrm{Herm}(\mathcal{A})$, $D(\mathcal{A})$ to denote the linear operators, Hermitian
operators and density operators over $\mathcal{A}$ respectively. We denote the trace norm of an operator $Q$
by $\|Q\|_{tr}$, i.e., $\|Q\|_{tr} = \mathrm{Tr}\,(Q^*Q)^{1/2}$, where $Q^*$ stands for the conjugate transpose of $Q$. The Frobenius
norm is denoted by $\|Q\|_F$ and the operator norm by $\|Q\|_{op}$. The $\ell_1$ norm of a vector
$x \in \mathbb{C}^n$ is denoted by $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ and its $\ell_\infty$ norm by $\|x\|_\infty = \max_{i=1,\ldots,n} |x_i|$. We
use $\|x\|$ to denote the Euclidean norm. The unit ball of $\mathbb{C}^n$ under a norm $\|\cdot\|$ is denoted by
$B(\mathbb{C}^n, \|\cdot\|)$.

2 Epsilon Net 

The epsilon-net (or $\epsilon$-net) is an important concept in several mathematical topics. For our purpose,
we adopt the following definition of $\epsilon$-net.

Definition 1 ($\epsilon$-net). Let $(X, d)$ be any metric space and let $\epsilon > 0$. A subset $\mathcal{N}_\epsilon$ is called an $\epsilon$-net
of $X$ if for each $x \in X$, there exists $y \in \mathcal{N}_\epsilon$ with $d(x, y) \leq \epsilon$.
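To make Definition 1 concrete, the following minimal sketch checks the covering property on a finite toy metric space. The function name `is_eps_net` and the toy space (the integers 0 to 10 with the absolute-value metric) are assumptions made purely for illustration; they do not appear in the paper.

```python
def is_eps_net(net, space, dist, eps):
    """Return True if every point of `space` is within `eps` of some point of `net`."""
    return all(any(dist(x, y) <= eps for y in net) for x in space)


# Toy metric space (illustrative only): the integers 0..10 with d(x, y) = |x - y|.
space = range(11)
dist = lambda x, y: abs(x - y)

# Every integer in 0..10 has a net point within distance 1.
print(is_eps_net([1, 4, 7, 10], space, dist, 1))
# With eps = 0 the same subset is no longer a net, since 0 is not a net point.
print(is_eps_net([1, 4, 7, 10], space, dist, 0))
```

The nets used later are subsets of $\mathbb{C}^M$ under the $\ell_1$ metric, but the covering condition being checked is the same.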

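Now we turn to the particular $\epsilon$-net considered in this paper. Let $\mathcal{H}$ be any Hilbert space
of dimension $d$ and $\mathcal{Q} = \mathcal{Q}(M, w) = (Q_1, Q_2, \ldots, Q_M)$ be a sequence of operators on $\mathcal{H}$ with
$\|Q_i\|_{op} \leq w$ for all $i$. Define the $\mathcal{Q}$-space, denoted by $\mathrm{SP}(\mathcal{Q})$, as

$$\mathrm{SP}(\mathcal{Q}) = \{(\langle Q_1, \rho\rangle, \langle Q_2, \rho\rangle, \ldots, \langle Q_M, \rho\rangle) : \rho \in D(\mathcal{H})\} \subseteq \mathbb{C}^M.$$

The set is convex and compact, and a (possibly proper) subset of Raw-$(M, w) = \{(q_1, q_2, \ldots, q_M) :
\forall i,\ q_i \in \mathbb{C},\ \|q_i\| \leq w\}$.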

In the following, we construct an $\epsilon$-net of the metric space $(\mathrm{SP}(\mathcal{Q}), \ell_1)$. Our method will first
generate an $\epsilon$-net of (Raw-$(M, w)$, $\ell_1$) via a standard procedure and then select those points that
are also close to the $\mathcal{Q}$-space. We will present and analyze the efficiency of the selection process first
and come back to the construction of the $\epsilon$-net afterwards.

Selection process 

The selection process determines whether some point $p$ in Raw-$(M, w)$ is close to $\mathrm{SP}(\mathcal{Q})$. Denote by
dis(p) the distance of $p \in \mathbb{C}^M$ to $\mathrm{SP}(\mathcal{Q})$, i.e.,

$$\mathrm{dis}(p) = \min_{q \in \mathrm{SP}(\mathcal{Q})} \|p - q\|_1.$$

We show in this section how to compute dis(p) efficiently in space. That the problem admits a
polynomial-time algorithm follows from the fact that it can be cast as a semidefinite programming
problem. However, to the authors' knowledge, only a few restricted classes of SDPs also admit
space-efficient algorithms, and none of them applies to our case. Thus we need to develop our
own space-efficient algorithm for this problem.

^2 We will abuse the notation later where the metric $d$ is replaced by the norm from which the metric is induced.

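By making use of the definition of $\mathrm{SP}(\mathcal{Q})$ and the duality of the $\ell_1$ norm, one can find the
following equivalent definition of the distance:

$$\mathrm{dis}(p) = \min_{\rho \in D(\mathcal{H})} \ \max_{z \in B(\mathbb{C}^M, \|\cdot\|_\infty)} \mathrm{Re}\, \langle p - q(\rho), z\rangle,$$

where

$$q(\rho) = (\langle Q_1, \rho\rangle, \langle Q_2, \rho\rangle, \ldots, \langle Q_M, \rho\rangle) \in \mathbb{C}^M. \qquad (1)$$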

By rephrasing dis(p) in the above form, one shows that the quantity dis(p) is actually an equilibrium
value. This follows from the well-known extensions of von Neumann's Min-Max Theorem [vN28,
Fan53]. One can easily verify that the density operator set $D(\mathcal{H})$ and the unit ball of $\mathbb{C}^M$ under the
$\ell_\infty$ norm are convex and compact sets. Moreover, the objective function is a bilinear form over the
two sets. The Min-Max theorem implies

$$\min_{\rho \in D(\mathcal{H})} \ \max_{z \in B(\mathbb{C}^M, \|\cdot\|_\infty)} \mathrm{Re}\, \langle p - q(\rho), z\rangle = \max_{z \in B(\mathbb{C}^M, \|\cdot\|_\infty)} \ \min_{\rho \in D(\mathcal{H})} \mathrm{Re}\, \langle p - q(\rho), z\rangle. \qquad (2)$$
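The duality step used to rewrite dis(p), namely $\|v\|_1 = \max_{\|z\|_\infty \leq 1} \mathrm{Re}\,\langle v, z\rangle$, can be checked numerically. The sketch below is only an illustration: the helper names `l1_norm`, `dual_witness` and `re_inner` are hypothetical, and the inner product is assumed to be $\langle v, z\rangle = \sum_i \overline{v_i} z_i$.

```python
def l1_norm(v):
    """l_1 norm of a complex vector."""
    return sum(abs(x) for x in v)


def dual_witness(v):
    """A point z of the l_infinity unit ball attaining Re<v, z> = ||v||_1."""
    return [x / abs(x) if x != 0 else 0j for x in v]


def re_inner(v, z):
    """Real part of <v, z> = sum_i conj(v_i) * z_i."""
    return sum((complex(a).conjugate() * b).real for a, b in zip(v, z))


v = [3 + 4j, -2, 0]
z_star = dual_witness(v)          # phases of the entries of v
z_other = [1, 1j, -1]             # some other point of the l_infinity unit ball

print(l1_norm(v), re_inner(v, z_star), re_inner(v, z_other))
```

The witness simply copies the phase of each coordinate of $v$, so each term of the inner product contributes $|v_i|$; any other point of the ball can only do worse.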

Fortunately, there is an efficient algorithm in either time or space (in terms of $d, M, w, 1/\epsilon$) to
approximate dis(p) with additive error $\epsilon$. The main tool used here is the so-called matrix
multiplicative weight update method [AHK05, Kal07, WK06]. Similar min-max forms also appeared
before in a series of works on quantum complexity [JW09, Wu10a, Wu10b, GW10]. The algorithm
presented here is another simple application of this powerful method. For the sake of completeness,
we provide the proof of the following lemma in Appendix A.

Lemma 3. Given any point $p \in$ Raw-$(M, w)$ and $\epsilon > 0$, there is an algorithm (depicted in Appendix A)
that approximates dis(p) with additive error $\epsilon$. Namely, the return value $\tilde{d}$ of this algorithm satisfies

$$\tilde{d} - \epsilon \leq \mathrm{dis}(p) \leq \tilde{d} + \epsilon.$$

Moreover, the algorithm runs in $\mathrm{poly}(d, M, w, 1/\epsilon)$ time. Furthermore, if $d$ is considered as the input size
and $M, w, 1/\epsilon \in O(\text{poly-log}(d))$, this algorithm is also efficient in parallel, namely, it is inside NC.



Construction of the e-net 

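We are now ready to show the construction of the $\epsilon$-net of $\mathrm{SP}(\mathcal{Q})$. As mentioned before, this
construction consists of the two steps below. Given any $\mathcal{Q}(M, w)$ and $\epsilon > 0$,

• Construct an $\epsilon$-net of the set Raw-$(M, w)$ with the metric induced by the $\ell_1$ norm. Denote
such an $\epsilon$-net by $\mathcal{R}_\epsilon$.

• For each point $p \in \mathcal{R}_\epsilon$, determine dis(p) and select it into $\mathcal{N}_\epsilon$ if $\mathrm{dis}(p) \leq \epsilon$. We claim $\mathcal{N}_\epsilon$ is an
$\epsilon$-net of $(\mathrm{SP}(\mathcal{Q}), \ell_1)$.

The construction for the first step is rather routine. Creating an $\epsilon'$-net $\mathcal{T}_{\epsilon'}$ over a bounded complex
region $\{z \in \mathbb{C} : \|z\| \leq w\}$ is simple: we can place a 2D grid over the complex plane to cover
the disk $\|z\| \leq w$. A simple argument shows $|\mathcal{T}_{\epsilon'}| \in O(w^2/\epsilon'^2)$. Then $\mathcal{R}_\epsilon$ can be obtained as the cross
product $\mathcal{T}_{\epsilon'} \times \cdots \times \mathcal{T}_{\epsilon'}$ ($M$ times). To ensure closeness in the $\ell_1$ norm, we choose $\epsilon' = \epsilon/M$.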
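The first step above can be sketched as follows. This is one possible grid construction under stated assumptions, not necessarily the one the authors had in mind: the grid spacing $s = \epsilon'/\sqrt{2}$, the round-toward-zero snapping rule, and the names `disk_grid`, `snap`, `raw_net` and `l1_distance` are all choices made here for illustration. Rounding each coordinate toward zero never increases the modulus, so the snapped point stays inside the disk and lies within one cell diagonal, i.e. $\epsilon'$, of the original point.

```python
import itertools
import math


def disk_grid(w, eps_prime):
    """Spacing s and integer index pairs (a, b) of grid points s*(a + bj) with |z| <= w."""
    s = eps_prime / math.sqrt(2)          # cell diagonal equals eps_prime
    n = int(w / s)
    tol = 1e-12                           # guard against floating-point rounding
    cell = {(a, b)
            for a in range(-n, n + 1)
            for b in range(-n, n + 1)
            if (a * s) ** 2 + (b * s) ** 2 <= w * w + tol}
    return s, cell


def snap(z, s):
    """Round each coordinate toward zero; the result is a grid index inside the disk."""
    return int(z.real / s), int(z.imag / s)


def raw_net(w, eps, M):
    """Lazily enumerate the product net for Raw-(M, w), using eps' = eps / M."""
    s, cell = disk_grid(w, eps / M)
    for idx in itertools.product(sorted(cell), repeat=M):
        yield [s * complex(a, b) for a, b in idx]


def l1_distance(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))


w, eps, M = 1.0, 1.0, 2
s, cell = disk_grid(w, eps / M)
p = [0.3 + 0.4j, -0.9 + 0.1j]                       # an arbitrary point of Raw-(M, w)
q = [s * complex(*snap(z, s)) for z in p]           # its snapped net point
print(len(cell) ** M, l1_distance(p, q) <= eps)
```

The number of grid points per coordinate is $O(w^2/\epsilon'^2)$, in line with the bound stated above, and the product over $M$ coordinates is only ever enumerated lazily.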
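Theorem 4. The $\mathcal{N}_\epsilon$ constructed above is indeed an $\epsilon$-net of $(\mathrm{SP}(\mathcal{Q}), \ell_1)$ with cardinality at most $O\big((w^2M^2/\epsilon^2)^M\big)$.
Furthermore, for any point $n \in \mathcal{N}_\epsilon$, we have $\mathrm{dis}(n) \leq \epsilon$.

Proof. First we show $\mathcal{R}_\epsilon$ is indeed an $\epsilon$-net of (Raw-$(M, w)$, $\ell_1$). To that end, consider any point
$p \in$ Raw-$(M, w)$. From the construction of $\mathcal{R}_\epsilon$, there is some point $q \in \mathcal{R}_\epsilon$ such that $\|p - q\|_\infty \leq \epsilon'$.
Then we have $\|p - q\|_1 \leq M\|p - q\|_\infty \leq M\epsilon' \leq \epsilon$. Since $\mathcal{N}_\epsilon \subseteq \mathcal{R}_\epsilon$, one has
$|\mathcal{N}_\epsilon| \leq |\mathcal{R}_\epsilon| \in O\big((w^2M^2/\epsilon^2)^M\big)$.

In order to show $\mathcal{N}_\epsilon$ is the required $\epsilon$-net, consider any point $p \in \mathrm{SP}(\mathcal{Q})$. Since $\mathrm{SP}(\mathcal{Q}) \subseteq$
Raw-$(M, w)$, there exists a point $p' \in \mathcal{R}_\epsilon$ such that $\|p - p'\|_1 \leq \epsilon$. Hence we have $\mathrm{dis}(p') \leq \epsilon$ and
the point $p'$ will be selected, namely $p' \in \mathcal{N}_\epsilon$. Finally, it is a simple consequence of the selection
process that every point $n \in \mathcal{N}_\epsilon$ has $\mathrm{dis}(n) \leq \epsilon$. □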

Remarks. If one chooses $\mathcal{Q}$ to be $\mathcal{Q}(d^2, 1) = \{|i\rangle\langle j| : i, j = 1, \ldots, d\}$, one can generate the $\epsilon$-net of
the density operator set with the $\ell_1$ norm by the method described above. It is akin to generating
an $\epsilon$-net for every entry of the density operator. At the other extreme, one can also efficiently
generate the $\epsilon$-net of a small-size $\mathrm{SP}(\mathcal{Q})$ even when the space dimension $d$ is relatively large.



3 The Main Algorithm 

In this section, we prove the main theorem. Without loss of generality, we assume $\mathcal{A}_1, \mathcal{A}_2$ are identical
and of dimension $d$ in Problem 1. Moreover, our algorithm will deal with the set of product
states rather than separable states. Namely, we consider the following optimization problem.

$$\begin{aligned} \max:\ & \langle Q, \rho\rangle \\ \text{subject to:}\ & \rho = \rho_1 \otimes \rho_2,\ \rho_1 \in D(\mathcal{A}_1),\ \rho_2 \in D(\mathcal{A}_2). \end{aligned} \qquad (3)$$

It is easy to see that these two optimization problems are equivalent: a linear objective attains its maximum
over the convex set of separable states at an extreme point, and the extreme points of that set are product states.
Our algorithm works for both maximization and minimization
of the objective function. In fact, both results can be obtained at the same time. Since our algorithm
naturally extends to multipartite cases, we will demonstrate the algorithm for the $k$-partite version
first, and then obtain the solution for Problem 1 as the special case $k = 2$.

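Problem 2 ($k$-partite version). Given any Hermitian matrix $Q$ over $\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_k$ ($k \geq 2$), compute
the optimum value OptSep(Q) of the following optimization problem to precision $\delta$.

$$\begin{aligned} \max:\ & \langle Q, \rho\rangle \\ \text{subject to:}\ & \rho = \rho_1 \otimes \cdots \otimes \rho_k,\ \forall i,\ \rho_i \in D(\mathcal{A}_i). \end{aligned} \qquad (4)$$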

Before describing the algorithm, we need some terminology about the decomposability of a multipartite
operator. Any Hermitian operator $Q$ over $\mathcal{A}_1 \otimes \mathcal{A}_2 \otimes \cdots \otimes \mathcal{A}_k$ is called $M$-decomposable if
there exist $(Q_1^t, Q_2^t, \ldots, Q_M^t) \in L(\mathcal{A}_t)^M$ for $t = 1, 2, \ldots, k$ such that

$$Q = \sum_{i=1}^{M} Q_i^1 \otimes Q_i^2 \otimes \cdots \otimes Q_i^k.$$

To facilitate the use of the $\epsilon$-net, we adopt a slight variation of the decomposability above. Let
$w \in \mathbb{R}_+^k$ denote the widths of operators over each $\mathcal{A}_i$. Any $Q$ is called $(M, w)$-decomposable if
$Q$ is $M$-decomposable and the widths of the operators in the decomposition are bounded in
the sense that $\max_i \|Q_i^t\|_{op} \leq w_t$ for $t = 1, 2, \ldots, k$. It is worth mentioning that the decomposability
defined above is related to the concept of tensor rank^3 defined in tensor product spaces.
Precisely, for any Hermitian operator $Q$ over $\mathcal{A}_1 \otimes \mathcal{A}_2 \otimes \cdots \otimes \mathcal{A}_k$, its tensor rank $\mathrm{rank}_\otimes(Q)$ is defined
to be $\min\{M \mid Q \text{ is } M\text{-decomposable}\}$. Its bounded tensor rank $\mathrm{brank}_\otimes(Q, w)$ is defined to be
$\min\{M \mid Q \text{ is } (M, w)\text{-decomposable}\}$.

1. Let $\mathcal{Q}_t(M, w_t) = (Q_1^t, Q_2^t, \ldots, Q_M^t)$ for $t = 1, \ldots, k-1$. Let $W = \prod_{t=1}^{k} w_t$. Generate the $\epsilon_t$-net (by
Theorem 4) of $(\mathrm{SP}(\mathcal{Q}_t), \ell_1)$ for each $t = 1, \ldots, k-1$ with $\epsilon_t = w_t\delta/((k-1)W)$ and denote such a set
by $\mathcal{N}_{\epsilon_t}^t$. Also let OPT store the optimum value of the maximization problem.

2. For each point $q = (q^1, q^2, \ldots, q^{k-1}) \in \mathcal{N}_{\epsilon_1}^1 \times \mathcal{N}_{\epsilon_2}^2 \times \cdots \times \mathcal{N}_{\epsilon_{k-1}}^{k-1}$, let $Q^q$ be

$$Q^q = \sum_{i=1}^{M} q_i^1 q_i^2 \cdots q_i^{k-1} Q_i^k$$

and calculate $\tilde{Q}^q = \frac{1}{2}(Q^q + Q^{q*})$. Then compute the maximum eigenvalue of $\tilde{Q}^q$, denoted by
$\lambda_{\max}(q)$. Update OPT as follows: $\mathrm{OPT} = \max\{\mathrm{OPT}, \lambda_{\max}(q)\}$.

3. Return OPT.

Figure 1: The main algorithm with precision $\delta$.

By definition, $\mathrm{rank}_\otimes(Q)$ (resp. $\mathrm{brank}_\otimes(Q, w)$) is the minimum $M$ for which $Q$ is $M$-decomposable
(resp. $(M, w)$-decomposable). However, given the representation of $Q$ as input, it is hard in general
to compute $\mathrm{rank}_\otimes(Q)$, $\mathrm{brank}_\otimes(Q, w)$, or the corresponding decomposition. Therefore it is hard
to make use of the optimal decomposition when $Q$ is the only input. Instead, for any $(M, w)$-decomposable
$Q$ we assume its corresponding decomposition is also a part of the input to our
algorithm.

Theorem 5. Let $Q$ be some $(M, w)$-decomposable Hermitian operator over $\mathcal{A}_1 \otimes \mathcal{A}_2 \otimes \cdots \otimes \mathcal{A}_k$ (each $\mathcal{A}_i$ is of
dimension $d$) and $\delta > 0$. Also let $(Q_1^t, Q_2^t, \ldots, Q_M^t)$, $t = 1, 2, \ldots, k$, be the operators in the corresponding
decomposition of $Q$. The algorithm shown in Fig. 1 approximates the optimum value OptSep(Q) of
Problem 2 with additive error $\delta$. Furthermore, the whole algorithm runs in
$O\big(((k-1)^2 W^2 M^2/\delta^2)^{(k-1)M}\big) \times \mathrm{poly}(d, M, k, W, 1/\delta)$ time.

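Proof. Let us first prove the correctness of the algorithm. By choosing $\mathcal{Q}_t(M, w_t) = (Q_1^t, Q_2^t, \ldots, Q_M^t)$
for $t = 1, \ldots, k-1$, the algorithm first generates the $\epsilon_t$-net $\mathcal{N}_{\epsilon_t}^t$ of each $(\mathrm{SP}(\mathcal{Q}_t), \ell_1)$, whose correctness
is guaranteed by Theorem 4. By substituting the identity $Q = \sum_{i=1}^{M} Q_i^1 \otimes Q_i^2 \otimes \cdots \otimes Q_i^k$, the
optimization problem becomes

$$\begin{aligned} \max:\ & \Big\langle \sum_{i=1}^{M} p_i^1 p_i^2 \cdots p_i^{k-1} Q_i^k,\ \rho_k \Big\rangle \\ \text{subject to:}\ & \forall t \in \{1, \ldots, k-1\},\ p^t \in \mathrm{SP}(\mathcal{Q}_t(M, w_t)),\ \text{and } \rho_k \in D(\mathcal{A}_k). \end{aligned}$$

^3 Our definition should be more accurately related to the Kronecker-product rank defined in [RvL11], a special case
of the more general concept of tensor rank.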
Thus, solving the optimization problem amounts to first enumerating $p^t \in \mathrm{SP}(\mathcal{Q}_t(M, w_t))$ for
each $t$, and then solving the optimization problem over $D(\mathcal{A}_k)$.

Consider any point $p = (p^1, p^2, \ldots, p^{k-1}) \in \mathrm{SP}(\mathcal{Q}_1) \times \cdots \times \mathrm{SP}(\mathcal{Q}_{k-1})$. By Theorem 4,
there is at least one point $q = (q^1, q^2, \ldots, q^{k-1}) \in \mathcal{N}_{\epsilon_1}^1 \times \mathcal{N}_{\epsilon_2}^2 \times \cdots \times \mathcal{N}_{\epsilon_{k-1}}^{k-1}$ such that $\|p^t - q^t\|_1 \leq
\epsilon_t$ for $t = 1, \ldots, k-1$. The choice of $\tilde{Q}^q = \frac{1}{2}(Q^q + Q^{q*})$ is to symmetrize $Q^q$, where the latter is not guaranteed to be
Hermitian because $q$ only comes from an $\epsilon$-net. With $\tilde{Q}^q$ being Hermitian, it is clear that $\lambda_{\max}(q) =
\max_{\rho_k \in D(\mathcal{A}_k)} \langle \tilde{Q}^q, \rho_k\rangle$. Now let us analyze how much error is induced in this process.

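Let $P^p = \sum_{i=1}^{M} p_i^1 p_i^2 \cdots p_i^{k-1} Q_i^k$ and $\tilde{P}^p = \frac{1}{2}(P^p + P^{p*})$. It is not hard to see that $\tilde{P}^p = P^p$. The
error bound is obtained by applying a chain of triangle inequalities as follows. Firstly, one has

$$\|\tilde{P}^p - \tilde{Q}^q\|_{op} = \Big\|\tfrac{1}{2}(P^p - Q^q) + \tfrac{1}{2}(P^{p*} - Q^{q*})\Big\|_{op} \leq \tfrac{1}{2}\big(\|P^p - Q^q\|_{op} + \|P^{p*} - Q^{q*}\|_{op}\big) = \|P^p - Q^q\|_{op}.$$

Then we substitute the expressions for $P^p$ and $Q^q$, and apply the standard hybrid argument:

$$\|P^p - Q^q\|_{op} = \Big\|\sum_{i=1}^{M} \big(p_i^1 \cdots p_i^{k-1} - q_i^1 \cdots q_i^{k-1}\big) Q_i^k\Big\|_{op} = \Big\|\sum_{i=1}^{M} \sum_{t=1}^{k-1} q_i^1 \cdots q_i^{t-1} (p_i^t - q_i^t)\, p_i^{t+1} \cdots p_i^{k-1} Q_i^k\Big\|_{op},$$

which is immediately upper bounded by the sum of the following terms,

$$\sum_{i=1}^{M} |p_i^1 - q_i^1|\,|p_i^2 \cdots p_i^{k-1}|\,\|Q_i^k\|_{op},\quad \sum_{i=1}^{M} |q_i^1|\,|p_i^2 - q_i^2|\,|p_i^3 \cdots p_i^{k-1}|\,\|Q_i^k\|_{op},\quad \ldots,\quad \sum_{i=1}^{M} |q_i^1 \cdots q_i^{k-2}|\,|p_i^{k-1} - q_i^{k-1}|\,\|Q_i^k\|_{op}.$$

As the $t$-th term above can be upper bounded by $\epsilon_t W/w_t$ for each $t = 1, \ldots, k-1$, we have

$$\|\tilde{P}^p - \tilde{Q}^q\|_{op} \leq \underbrace{\frac{\delta}{k-1} + \cdots + \frac{\delta}{k-1}}_{k-1 \text{ terms}} = \delta.$$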
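For instance, in the bipartite case $k = 2$ the chain above collapses to a single term. With $W = w_1 w_2$ and $\epsilon_1 = w_1\delta/W = \delta/w_2$, one gets the following (a worked special case for illustration, not an additional claim):

```latex
\[
\| P^p - Q^q \|_{op}
  = \Big\| \sum_{i=1}^{M} (p_i^1 - q_i^1)\, Q_i^2 \Big\|_{op}
  \le \sum_{i=1}^{M} |p_i^1 - q_i^1| \, \| Q_i^2 \|_{op}
  \le \| p^1 - q^1 \|_1 \, w_2
  \le \epsilon_1 w_2 = \delta .
\]
```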

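Hence the optimum value for any fixed $p$ will not differ too much from the one for its approximation
$q$ in the $\epsilon$-net. This is because

$$\max_{\rho_k \in D(\mathcal{A}_k)} \langle \tilde{P}^p, \rho_k\rangle = \max_{\rho_k \in D(\mathcal{A}_k)} \Big( \langle \tilde{Q}^q, \rho_k\rangle + \langle \tilde{P}^p - \tilde{Q}^q, \rho_k\rangle \Big).$$

By Hölder's inequality we have $|\langle \tilde{P}^p - \tilde{Q}^q, \rho_k\rangle| \leq \|\tilde{P}^p - \tilde{Q}^q\|_{op} \|\rho_k\|_{tr} \leq \delta$, and thus

$$\lambda_{\max}(q) - \delta \leq \max_{\rho_k \in D(\mathcal{A}_k)} \langle P^p, \rho_k\rangle \leq \lambda_{\max}(q) + \delta.$$

We now optimize $p$ over $\mathrm{SP}(\mathcal{Q}_1) \times \cdots \times \mathrm{SP}(\mathcal{Q}_{k-1})$, and the corresponding $q$ will run over the $\epsilon$-net
$\mathcal{N}_{\epsilon_1}^1 \times \mathcal{N}_{\epsilon_2}^2 \times \cdots \times \mathcal{N}_{\epsilon_{k-1}}^{k-1}$. As every point $q$ of this net is also close to $\mathrm{SP}(\mathcal{Q}_1) \times
\cdots \times \mathrm{SP}(\mathcal{Q}_{k-1})$ in the sense that $\mathrm{dis}(q^t) \leq \epsilon_t$ for each $t$, we have

$$\max_{q \in \mathcal{N}_{\epsilon_1}^1 \times \cdots \times \mathcal{N}_{\epsilon_{k-1}}^{k-1}} \lambda_{\max}(q) - \delta \ \leq \max_{p \in \mathrm{SP}(\mathcal{Q}_1) \times \cdots \times \mathrm{SP}(\mathcal{Q}_{k-1})} \ \max_{\rho_k \in D(\mathcal{A}_k)} \langle P^p, \rho_k\rangle \ \leq \max_{q \in \mathcal{N}_{\epsilon_1}^1 \times \cdots \times \mathcal{N}_{\epsilon_{k-1}}^{k-1}} \lambda_{\max}(q) + \delta.$$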
Finally, it is not hard to see that $\mathrm{OPT} = \max_{q \in \mathcal{N}_{\epsilon_1}^1 \times \cdots \times \mathcal{N}_{\epsilon_{k-1}}^{k-1}} \lambda_{\max}(q)$ and therefore

$$\mathrm{OPT} - \delta \leq \mathrm{OptSep}(Q) \leq \mathrm{OPT} + \delta.$$

Now let us analyze the efficiency of this algorithm. The total number of points in the $\epsilon$-net $\mathcal{N}_{\epsilon_1}^1 \times
\mathcal{N}_{\epsilon_2}^2 \times \cdots \times \mathcal{N}_{\epsilon_{k-1}}^{k-1}$ is upper bounded by $O\big(((k-1)^2 W^2 M^2/\delta^2)^{(k-1)M}\big)$ by Theorem 4. For each point $q$, the
generation of such a point costs time polynomial in $d, M, W, 1/\delta$ (this part is done through the
calculation of dis(q); see Lemma 3). After the generation process, one needs to calculate $\tilde{Q}^q$ and
its maximum eigenvalue for each point, which can be done in time polynomial in $d, k, M$. Thus,
the total running time is bounded by $O\big(((k-1)^2 W^2 M^2/\delta^2)^{(k-1)M}\big) \times \mathrm{poly}(d, M, k, W, 1/\delta)$. □
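Step 2 of the algorithm in Fig. 1 can be sketched as follows for $k = 2$. This is a minimal illustration under strong simplifying assumptions that are not in the paper: the second party is a single qubit (so the maximum eigenvalue has a closed form), the factors $Q_i^2$ are Hermitian, and the net points are handed in directly instead of being produced by the selection process. The names `combine`, `hermitian_part`, `lambda_max_2x2` and `step2` are hypothetical.

```python
import math


def combine(q, ops):
    """Q^q = sum_i q_i * ops[i] for 2x2 matrices given as nested lists."""
    return [[sum(qi * op[r][c] for qi, op in zip(q, ops)) for c in range(2)]
            for r in range(2)]


def hermitian_part(a):
    """(A + A^*) / 2, the symmetrization applied to Q^q in Step 2."""
    return [[(a[r][c] + complex(a[c][r]).conjugate()) / 2 for c in range(2)]
            for r in range(2)]


def lambda_max_2x2(h):
    """Largest eigenvalue of a 2x2 Hermitian matrix, in closed form."""
    a, d, b = complex(h[0][0]).real, complex(h[1][1]).real, h[0][1]
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)


def step2(net_points, ops):
    """OPT = max over the given points q of lambda_max of the symmetrized Q^q."""
    return max(lambda_max_2x2(hermitian_part(combine(q, ops))) for q in net_points)


# Illustrative instance: Q = X (x) X + Z (x) Z with Pauli matrices X and Z.
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
# Hand-picked stand-ins for net points; (1j, 0) is a point that lies outside SP.
points = [(0.6, 0.8), (1.0, 0.0), (1j, 0)]
print(step2(points, [X, Z]))
```

For this instance the first party's values $(\langle X, \rho_1\rangle, \langle Z, \rho_1\rangle)$ range over the real unit disk, and each such point yields a maximum eigenvalue equal to its Euclidean length, so the optimum over product states is 1. A general implementation would replace the closed form with a Hermitian eigenvalue routine.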

Remarks. There are a few remarks to make about Theorem 5. First, it is straightforward to extend
the concept of decomposability to its approximate version. For instance, any Hermitian $Q$ is called
$\epsilon$-approximate $(M, w)$-decomposable if there exists some $(M, w)$-decomposable $\tilde{Q}$ such that
$\|Q - \tilde{Q}\| \leq \epsilon$, where the norm could be either the operator norm or the injective tensor norm. It is easy
to verify that the same algorithm solves OptSep(Q) approximately.

Second, all operations in the algorithm described in Fig. 1 can be implemented efficiently in
parallel in some situations. This is because fundamental operations of matrices can be done in
NC and the calculation of dis(p) can be done in NC (see Lemma 3) when $M, W, k, 1/\delta$ are in nice
forms of $d$. Thus, we can apply the observation stated in the introduction and prove that the algorithm
in Fig. 1 can also be made space-efficient. To facilitate the later use of this result, we will change
the input size as follows.

Corollary 4. Let $n$ be the input size such that $d = \exp(\mathrm{poly}(n))$. If $W/\delta \in O(\mathrm{poly}(n))$ and $kM \in
O(\mathrm{poly}(n))$, then OptSep(Q) can be approximated with additive error $\delta$ in PSPACE.

Proof. Here we present an argument that composes space-efficient algorithms directly. Given $Q$
and its decomposition as input, consider the following algorithm.

1. Enumerate each point $p = (p^1, p^2, \ldots, p^{k-1})$ from the raw set $\mathcal{R}_{\epsilon_1}^1 \times \cdots \times \mathcal{R}_{\epsilon_{k-1}}^{k-1}$.

2. Compute $\mathrm{dis}(p^t)$ for each $t = 1, \ldots, k-1$. If $p$ is a valid point in the epsilon-net, then continue
with the rest of Step 2 of the algorithm in Fig. 1.

3. Compare the values obtained for each point $p$ and keep the optimum one.

Given the condition $W/\delta \in O(\mathrm{poly}(n))$, $kM \in O(\mathrm{poly}(n))$, the first part of the algorithm can
be done in polynomial space. This is because in this case each point in the raw set can be represented
in polynomial space and therefore enumerated in polynomial space. The second part
is more difficult. Computing $\mathrm{dis}(p^t)$ for each $t = 1, \ldots, k-1$ can be done in NC(poly(n)) as shown
in Lemma 3. Step 2 in the main algorithm only contains fundamental operations of matrices
and the spectral decomposition. Thus, it also admits a parallel algorithm in NC(poly(n)). One
can easily compose the two circuits and get a polynomial-space implementation by the relation
NC(poly)=PSPACE [Bor77]. The third part can obviously be done in polynomial space.

Thus, by composing these three polynomial-space implementable parts, one proves that the whole
algorithm can be done in PSPACE. □
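The three steps of this proof share one pattern: scan an exponentially large product set while storing only the current point and the best value seen so far. The sketch below illustrates that pattern with a mixed-radix counter. It is a hedged illustration rather than the authors' construction: `point_at`, `is_selected` and `score` are hypothetical callbacks standing in for, respectively, indexing into a raw net, the test $\mathrm{dis}(p^t) \leq \epsilon_t$ of Lemma 3, and Step 2 of Fig. 1.

```python
def streaming_max(sizes, point_at, is_selected, score):
    """Scan a product of indexed sets, keeping only a counter and the best score.

    sizes       : sizes[t] is the number of points in the t-th raw net (each >= 1)
    point_at    : point_at(t, i) returns the i-th point of the t-th raw net
    is_selected : is_selected(t, coord) stands in for the check dis(p^t) <= eps_t
    score       : score(point) stands in for Step 2 of the main algorithm
    """
    idx = [0] * len(sizes)                 # the only state besides `best`
    best = None
    while True:
        point = [point_at(t, i) for t, i in enumerate(idx)]
        if all(is_selected(t, c) for t, c in enumerate(point)):
            value = score(point)
            if best is None or value > best:
                best = value
        # Advance the mixed-radix counter; stop once every digit has wrapped around.
        pos = 0
        while pos < len(idx):
            idx[pos] += 1
            if idx[pos] < sizes[pos]:
                break
            idx[pos] = 0
            pos += 1
        if pos == len(idx):
            return best


# Toy usage: two "nets" {0, 1, 2} and {0, 1}; the point 2 is rejected by the selection test.
print(streaming_max([3, 2], lambda t, i: i, lambda t, c: c != 2, sum))
```

The counter needs only as many digits as there are parties, each bounded by the size of one raw net, which mirrors the observation that every point of the raw set has a polynomial-size description.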
4 Simulation of several variants of QMA(2)

This section illustrates how one can make use of the algorithm shown in Section 3 (when $k = 2$) to
simulate some variants of the complexity class QMA(2). The idea is to show that, for those variants,
the corresponding POVM matrices of acceptance are $(M, w)$-decomposable with small $M$. Before
we dive into the details, let us recall the definition of the complexity class QMA(2).

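Definition 2. A language $\mathcal{L}$ is in QMA(2)$_{m,c,s}$ if there exists a polynomial-time generated family
of quantum verification circuits $Q = \{Q_n \mid n \in \mathbb{N}\}$ such that for any input $x$ of size $n$, the circuit
$Q_n$ implements a two-outcome measurement $\{Q_x^{acc}, \mathbb{1} - Q_x^{acc}\}$. Furthermore,

• Completeness: If $x \in \mathcal{L}$, there exist 2 witnesses $|\psi_1\rangle \in \mathcal{A}_1$, $|\psi_2\rangle \in \mathcal{A}_2$, each of $m$ qubits, such
that

$$\langle Q_x^{acc},\ |\psi_1\rangle\langle\psi_1| \otimes |\psi_2\rangle\langle\psi_2| \rangle \geq c.$$

• Soundness: If $x \notin \mathcal{L}$, then for any states $|\psi_1\rangle \in \mathcal{A}_1$, $|\psi_2\rangle \in \mathcal{A}_2$,

$$\langle Q_x^{acc},\ |\psi_1\rangle\langle\psi_1| \otimes |\psi_2\rangle\langle\psi_2| \rangle \leq s.$$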

We call QMA(2) = QMA(2)$_{\mathrm{poly}(n),2/3,1/3}$. It is easy to see that simulating the complexity class QMA(2) amounts to distinguishing between the two promises of the maximum acceptance probability, represented by the inner product $\langle Q_x^{acc}, \rho\rangle$, over the set of all possible valid strategies of the two provers, which is exactly SepD$(\mathcal{A}_1 \otimes \mathcal{A}_2)$. Note the maximum acceptance probability is exactly OptSep$(Q_x^{acc})$ defined in Problem 1. Thus, if one were able to distinguish between the two promises of OptSep$(Q_x^{acc})$, one could simulate this protocol with the same amount of resources (time or space).

The first example is the variant with only logarithmic-size proofs, namely QMA(2)$_{O(\log(n)),2/3,1/3}$. It is not hard to see that the corresponding POVMs of acceptance (i.e., $Q_x^{acc}$) must be $(\mathrm{poly}(n), w)$-decomposable since $\mathcal{A}_1, \mathcal{A}_2$ in this case are only of polynomial dimension. Moreover, $w$ could be $(1, 1)$ in this case. Thus, it follows directly from Corollary 4 that OptSep$(Q_x^{acc})$ can be approximated in polynomial space. Namely,

QMA(2)$_{O(\log(n)),2/3,1/3} \subseteq$ PSPACE.

The next example is slightly less trivial. Before moving on, we need some terminology about the quantum verification circuits $Q$. Assume the input $x$ is fixed from now on. Let $\mathcal{A}_1, \mathcal{A}_2$ be the Hilbert spaces of size $d_A$ for the two proofs and let $\mathcal{V}$ be the ancillary space of size $d_V$. Note $d_A, d_V$ are exponential in $n$. Then the quantum verification process will be carried out on the space $\mathcal{A}_1 \otimes \mathcal{A}_2 \otimes \mathcal{V}$ with some initial state $|\psi_1\rangle \otimes |\psi_2\rangle \otimes |0\rangle$, where $|\psi_1\rangle, |\psi_2\rangle$ are provided by the provers. The verification process is also efficient in the sense that the whole circuit only consists of polynomially many elementary gates. Without loss of generality, we can fix one universal gate set for the verification circuits in advance. In particular, we choose the universal gate set to be single-qubit gates plus the CNOT gate [NC00]. One can also choose other universal gate sets without any change to the main result.

We categorize all elementary gates in the verification circuits into two types. A gate is of type-I if it only affects the qubits within the same space (i.e., $\mathcal{A}_1$, $\mathcal{A}_2$, or $\mathcal{V}$). Otherwise, this gate is of type-II. It is easy to see single-qubit gates are always type-I gates. The only type-II gates are CNOT gates whose control qubit and target qubit sit in different spaces. Let $p, r : \mathbb{N} \to \mathbb{N}$ be polynomially bounded functions. A polynomial-time generated family of quantum verification circuits $Q$ is called $Q[p, r]$ if each $Q_n$ only contains $p(n)$ type-I elementary gates and $r(n)$ type-II elementary gates.
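The type-I/type-II classification is purely a bookkeeping exercise over which registers a gate touches. A minimal sketch in Python, assuming a hypothetical circuit representation in which each gate is given as a tuple of the qubit indices it acts on and `register_of` maps each qubit to its space:

```python
def count_gate_types(gates, register_of):
    """Return (number of type-I gates, number of type-II gates)."""
    type_one = type_two = 0
    for qubits in gates:
        registers = {register_of[q] for q in qubits}
        if len(registers) == 1:
            type_one += 1   # all qubits of the gate lie in one space: type-I
        else:
            type_two += 1   # the gate straddles two spaces: type-II
    return type_one, type_two

# Hypothetical layout: two qubits each for A1, A2 and the ancilla space V.
register_of = {0: "A1", 1: "A1", 2: "A2", 3: "A2", 4: "V", 5: "V"}
# Single-qubit gates are 1-tuples, CNOT gates are (control, target) pairs.
gates = [(0,), (0, 1), (1, 2), (4,), (3, 5), (4, 5)]
p_count, r_count = count_gate_types(gates, register_of)
```

In this toy circuit only the CNOTs on qubits (1, 2) and (3, 5) cross a boundary between spaces, so the circuit has 4 type-I gates and 2 type-II gates.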

Definition 3. A language $\mathcal{L}$ is in QMA(2)$_{m,c,s}[p, r]$ if $\mathcal{L}$ is in QMA(2)$_{m,c,s}$ with some $Q[p, r]$ verification circuit family.

It is easy to see that QMA(2) = QMA(2)[poly, poly] from our definition. In the following we will show that when the number of type-II gates is relatively small, one can simulate this complexity model efficiently by the algorithm in Fig. 1.

Lemma 6. For any family of verification circuits $Q[p, r]$, the corresponding POVM $Q_x^{acc}$ is $(4^{r(n)}, (1, 1))$-decomposable for any $n \in \mathbb{N}$ and input $x$. Moreover, this decomposition can be calculated in parallel with $O(t(n) 4^{r(n)}) \times \mathrm{poly}(n)$ time.

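Proof. For any $n \in \mathbb{N}$ and input $x$, let us denote the whole unitary that the verification circuit applies on the initial state by $U = U_t U_{t-1} \cdots U_1$, where each $U_i$ corresponds to one elementary gate and $t = p + r$. Without loss of generality, we assume the output bit is the first qubit in the space $\mathcal{V}$ and the verification accepts when that qubit is 1. Let $\tilde{\mathcal{V}}$ be the space $\mathcal{V}$ without the first qubit; then we have

$$Q_x^{acc} = \mathrm{Tr}_{\mathcal{V}} \left( (\mathbb{1}_{\mathcal{A}_1\mathcal{A}_2} \otimes |0\rangle\langle 0|) \, U^* (\mathbb{1}_{\mathcal{A}_1\mathcal{A}_2} \otimes |1\rangle\langle 1| \otimes \mathbb{1}_{\tilde{\mathcal{V}}}) \, U \, (\mathbb{1}_{\mathcal{A}_1\mathcal{A}_2} \otimes |0\rangle\langle 0|) \right).$$

Let $P_{t+1} = \mathbb{1}_{\mathcal{A}_1\mathcal{A}_2} \otimes |1\rangle\langle 1| \otimes \mathbb{1}_{\tilde{\mathcal{V}}}$ and $P_\tau = U_\tau^* P_{\tau+1} U_\tau$ for $\tau = t, t-1, \ldots, 1$. It is easy to see $P_1 = U^* (\mathbb{1}_{\mathcal{A}_1\mathcal{A}_2} \otimes |1\rangle\langle 1| \otimes \mathbb{1}_{\tilde{\mathcal{V}}}) U$. Also it is straightforward to verify that $P_{t+1}$ is 1-decomposable. Now let us observe how the decomposability of $P_\tau$ changes with $\tau$.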

For each $\tau$, the unitary $U_\tau$ corresponds to either a type-I or a type-II elementary gate. In the former case, applying $U_\tau$ won't change the decomposability. Thus, $P_\tau$ is $M$-decomposable if $P_{\tau+1}$ is. In the latter case, applying $U_\tau$ will potentially change the decomposability in the following sense. For any such CNOT gate one has $U_\tau = |0\rangle\langle 0| \otimes \mathbb{1} + |1\rangle\langle 1| \otimes X$, where $X$ is the Pauli matrix for the flip. And one can show

$$P_\tau = (|0\rangle\langle 0| \otimes \mathbb{1}) P_{\tau+1} (|0\rangle\langle 0| \otimes \mathbb{1}) + (|0\rangle\langle 0| \otimes \mathbb{1}) P_{\tau+1} (|1\rangle\langle 1| \otimes X) + (|1\rangle\langle 1| \otimes X) P_{\tau+1} (|0\rangle\langle 0| \otimes \mathbb{1}) + (|1\rangle\langle 1| \otimes X) P_{\tau+1} (|1\rangle\langle 1| \otimes X).$$

Thus in general we can only say $P_\tau$ is $4M$-decomposable if $P_{\tau+1}$ is $M$-decomposable. As there are $r(n)$ type-II gates, one immediately has that $P_1$ is $4^{r(n)}$-decomposable. Moreover, each operator appearing in the decomposition is a product of unitaries, $|0\rangle\langle 0|$, $|1\rangle\langle 1|$ and $X$ in some order, which implies the operator norm of those operators is bounded by 1. Therefore $P_1$ is $(4^{r(n)}, (1, 1))$-decomposable.

Finally, it is not hard to verify that multiplications with $\mathbb{1}_{\mathcal{A}_1\mathcal{A}_2} \otimes |0\rangle\langle 0|$ and the partial trace over $\mathcal{V}$ won't change the decomposability of $P_1$. Namely, $Q_x^{acc}$ is $(4^{r(n)}, (1, 1))$-decomposable. The above proof can also be considered as the process to compute the decomposition of $Q_x^{acc}$. Each multiplication of matrices can be done in NC(poly(n)), and the total number of multiplications is upper bounded by $O(t(n) 4^{r(n)})$. Therefore, the total parallel running time is upper bounded by $O(t(n) 4^{r(n)}) \times \mathrm{poly}(n)$. □
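The factor-4 blow-up per type-II gate can be checked numerically on the smallest instance: one control qubit in one space and one target qubit in another. The sketch below (NumPy, with the hypothetical helper name `operator_schmidt_rank`) conjugates a product operator by a CNOT, verifies the four-term expansion, and confirms that each of the four terms is again a product operator.

```python
import numpy as np

rng = np.random.default_rng(0)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)   # |0><0|
P1 = np.diag([0, 1]).astype(complex)   # |1><1|

A0 = np.kron(P0, I2)                   # |0><0| (x) 1
A1 = np.kron(P1, X)                    # |1><1| (x) X
cnot = A0 + A1                         # control = first qubit, target = second

def random_hermitian(n):
    G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return G + G.conj().T

def operator_schmidt_rank(M):
    """Rank of the realigned 4x4 matrix; equals 1 exactly for product operators."""
    R = M.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)
    return np.linalg.matrix_rank(R, tol=1e-9)

P_next = np.kron(random_hermitian(2), random_hermitian(2))   # 1-decomposable
lhs = cnot.conj().T @ P_next @ cnot
terms = [A0 @ P_next @ A0, A0 @ P_next @ A1, A1 @ P_next @ A0, A1 @ P_next @ A1]
expansion_ok = np.allclose(lhs, sum(terms))
term_ranks = [operator_schmidt_rank(T) for T in terms]
```

Each entry of `term_ranks` is at most 1, so a 1-decomposable operator becomes at most 4-decomposable after one type-II gate, as used in the proof.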
Corollary 5. QMA(2)[poly(n), $O(\log(n))$] $\subseteq$ PSPACE.

Proof. This is a simple consequence of Lemma 6 and Corollary 4. For any fixed $x$ of length $n$, one can first compute the decomposition of $Q_x^{acc}$ in parallel with $O(t(n) 4^{r(n)}) \times \mathrm{poly}(n)$ time, which is parallel polynomial time in $n$ when $r(n) = O(\log(n))$ and $t(n) \in \mathrm{poly}(n)$. Hence the first step can be done in polynomial space via the relation NC(poly) = PSPACE [Bor77].

Then one can invoke the parallel algorithm in Corollary 4 to approximate OptSep$(Q_x^{acc})$ to sufficient precision $\delta$ such that one can distinguish between the two promises. Precisely, in this case we choose the parameters as follows:

$$k = 2, \quad W = 1, \quad M = 4^{O(\log(n))} = \mathrm{poly}(n), \quad 1/\delta = \mathrm{poly}(n).$$

Thus the whole algorithm can be done in polynomial space, which completes the proof. □ 
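The step from a logarithmic number of type-II gates to a polynomial $M$ can be spelled out as follows. The constant $c$ and the base-2 logarithm are assumptions made only for this illustration; any other base changes the exponent by a constant factor.

```latex
% Assume r(n) <= c * log_2(n) for some constant c > 0.
\[
  M \;=\; 4^{r(n)} \;\le\; 4^{c \log_2 n} \;=\; 2^{2c \log_2 n} \;=\; n^{2c},
  \qquad
  t(n)\, 4^{r(n)} \;\le\; t(n)\, n^{2c} \;\in\; \mathrm{poly}(n)
  \quad \text{whenever } t(n) \in \mathrm{poly}(n).
\]
```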

Remarks. Although the proof of the result is not too technical, it establishes the first non-trivial upper bound (PSPACE in this case) for variants of QMA(2) that allow quantum operations acting on both proofs at the same time. In contrast, previous results are all about variants with nonadaptive or adaptive local measurements, like BellQMA(2) [Bra08, ABD+09, CD10] or LOCC-QMA(2) [ABD+09, BCY11].

However, our results are hard to extend to the most general case of QMA(2). This is because the SWAP-test operation uses many more type-II gates than our method allows, and the SWAP-test seems to be inevitable if one wants to fully characterize the power of QMA(2).

5 Quasi-polynomial algorithms for local Hamiltonian cases 

In this section, we illustrate that if the $Q$ appearing in the objective function is a local Hamiltonian, then the optimal value OptSep$(Q)$ can be efficiently computed by our main algorithm. Consider any $k$-partite space $\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_k$ where each party $\mathcal{A}_i$ contains $n$ qubits and thus is of dimension $2^n$.

Definition 4. Any Hermitian $Q$ over $\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_k$ is an $l$-local Hamiltonian if $Q$ is expressible as $Q = \sum_{i=1}^{r} H_i$, where each term is a Hermitian operator acting on at most $l$ qubits among the $k$ parties.

Hamiltonians are widely studied in physics since they usually characterize the energy of a 
physical system. Local Hamiltonians are of particular interest since they refer to the energy of 
many interesting models in low-dimension systems. Our algorithm can be considered as a way to 
find the minimum energy in the system achieved by separable states. 

Local Hamiltonians have also been appealing to computational complexity theorists since the discovery of the promise 5-local Hamiltonian problem [KSV02], which turns out to be QMA-complete. Precisely, it refers to the following promise problem when $k = 1$, $l = 5$.

Problem 3 ($k$-partite $l$-local Hamiltonian problem). Take the expression $Q = \sum_{i=1}^{r} H_i$ for any $l$-local Hamiltonian over $\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_k$ as input*, where $\|H_i\|_{op} \le 1$ for each $i$. Let OptSep$(Q)$ denote the minimum value of $\langle Q, \rho\rangle$ achieved for some $\rho \in$ SepD$(\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_k)$. The goal is to tell between the following two promises: either OptSep$(Q) > a$ or OptSep$(Q) < b$ for some $a > b$ with inverse polynomial gap.

*It is worth mentioning that the input size of local Hamiltonian problems can be only polylogarithmic in the dimension of the space in which $Q$ sits.
When $k = 1$, the promise problem defined above is exactly the original $l$-local Hamiltonian problem. Subsequent results demonstrate that it remains QMA-complete even when $l = 3, 2$ [AGIK09, KKR06, OT08]. Our definition of the promise problem naturally extends to the $k$-partite case. We refer to Chapter 14 in [KSV02] for technical details. It is not hard to see that $k$-partite $l$-local Hamiltonian problems belong to QMA(k) by applying similar techniques to those in the original proof. However, they do not remain QMA(k)-complete problems. This is because the original reduction transforms from the proof space to the transcript and clock space, and the separability of quantum states is not preserved under such an operation. As a result, the $k$-partite local Hamiltonian problems defined above only enforce the separability in the transcript and clock space rather than in the proof space. Note this is not an issue for the 1-partite case since there is no separability involved. Nevertheless it becomes a serious problem for its $k$-partite extensions.

Lemma 7. Any $l$-local Hamiltonian $Q = \sum_{i=1}^{r} H_i$ over $\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_k$ such that $\|H_i\|_{op} \le w$ for each $i$ is $(O((4nk)^l), w)$-decomposable.

Proof. Since $Q$ is an $l$-local Hamiltonian, it is easy to see $r \le \binom{nk}{l}$. For each $H_i$ with $\|H_i\|_{op} \le w$, since it acts on at most $l$ qubits, it must be $(4^l, w)$-decomposable. Thus $Q$ is $(r 4^l, w)$-decomposable. In terms of only $n, k, l$, we have that $Q$ is $(O((4nk)^l), w)$-decomposable. □
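One concrete way to see the $4^l$ factor is the Pauli expansion of an operator on $l$ qubits: every term is a tensor product of single-qubit operators, hence a product operator across the parties. The sketch below is an illustration under that assumption (the paper's proof does not name a specific basis); `pauli_decompose` is a hypothetical helper, and the values of $n$, $k$, $l$ are toy choices.

```python
import functools
import itertools
import math
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_decompose(H, l):
    """Expand an l-qubit operator H as a sum of 4**l Pauli tensor products."""
    terms = []
    for idx in itertools.product(range(4), repeat=l):
        P = functools.reduce(np.kron, [PAULIS[i] for i in idx])
        coeff = np.trace(P @ H) / 2 ** l   # Paulis are orthogonal under the trace
        terms.append((coeff, P))
    return terms

rng = np.random.default_rng(2)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = G + G.conj().T                          # a 2-qubit Hermitian term
terms = pauli_decompose(H, l=2)
reconstruction = sum(c * P for c, P in terms)
max_coeff = max(abs(c) for c, _ in terms)
op_norm = np.linalg.norm(H, 2)

# Counting bound from the proof, for toy values n = 3, k = 2, l = 2.
n, k, l = 3, 2, 2
num_product_terms = math.comb(n * k, l) * 4 ** l
```

Every coefficient is bounded in magnitude by $\|H\|_{op}$, which is consistent with the norm parameter $w$ being inherited by the decomposition.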

Corollary 6. Take the expression $Q = \sum_{i=1}^{r} H_i$ of any $l$-local Hamiltonian over $\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_k$ (each $\mathcal{A}_i$ is of dimension $d = 2^n$) such that $\|H_i\|_{op} \le w$ for each $i$ as input. Assuming $k, l = O(1)$, the quantity OptSep$(Q)$ can be approximated to precision $\delta$ in quasi-polynomial time in $d, w, 1/\delta$.

If $n$ is considered as the input size and $w/\delta = O(\mathrm{poly}(n))$, then OptSep$(Q)$ can be approximated to precision $\delta$ in PSPACE.

Proof. The first part follows directly from Lemma 7 and Theorem 5. Recall that the proof of Lemma 7 also provides a way to compute the decomposition of $Q$ given the expression $Q = \sum_{i=1}^{r} H_i$ as input. It is easy to verify that $O(r 4^l)$ time (upper bounded by $O((4k \log d)^l)$) is sufficient to complete this computation. After that, one may directly invoke the algorithm in Fig. 1 and make use of Theorem 5. Now we substitute the following identities into our main algorithm. Note $k, l = O(1)$, and we have $M = O(\log^{O(1)} d)$, $W = w^{O(1)}$. One immediately gets the total running time bounded by

$$\exp\left(O\left(\log^{O(1)}(d) \, (\log\log d + \log(w/\delta))\right)\right) \times \mathrm{poly}(d, w, 1/\delta),$$

which is quasi-polynomial time in $d, w, 1/\delta$.

For the second part, when $n$ is considered as the input size, it is easy to see the computation of the decomposition of $Q$ according to Lemma 7 can be done in NC(poly), hence in polynomial space. (Note $M = O(\mathrm{poly}(n))$.) Then by composing with the polynomial-space algorithm implied by Corollary 4, one proves the whole algorithm can be implemented in polynomial space. □

Remarks. It is a direct consequence of Corollary 6 that Problem 3 is inside PSPACE.

6 An algorithm with running time exponential in $\|Q\|_F$

In this section we demonstrate another application of the simple idea of "enumeration" by epsilon-net to Problem 1. As a result, we obtain an algorithm with running time exponential in $\|Q\|_F$
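1. Compute the spectral decomposition of $Q$, i.e., $Q = \sum_t \lambda_t |\Psi_t\rangle\langle\Psi_t|$. Choose $\epsilon = \delta/2$ and $\Gamma_\epsilon = \{t : \lambda_t > \epsilon\}$. Also let OPT store the optimum value of the maximization problem.

2. Generate the $\epsilon$-net of the unit ball of $\mathbb{C}^{|\Gamma_\epsilon|}$ under the Euclidean norm with $\epsilon = \frac{\delta}{4\|Q\|_F}$ (this is the net parameter; the index set $\Gamma_\epsilon$ stays as fixed in Step 1). Denote such a set by $\mathcal{M}_\epsilon$. Then for each point $\alpha \in \mathcal{M}_\epsilon$,

(a) Compute $|\phi_\alpha\rangle = \sum_{t \in \Gamma_\epsilon} \alpha_t^* \sqrt{\lambda_t} \, |\Psi_t\rangle$ and compute the Schmidt decomposition of $|\phi_\alpha\rangle$, i.e.,

$$|\phi_\alpha\rangle = \sum_i \mu_i |u_i\rangle |v_i\rangle,$$

where $\mu_1 \ge \mu_2 \ge \cdots$ and $\{u_i\}$, $\{v_i\}$ are orthogonal bases. Note $|\phi_\alpha\rangle$ is not necessarily a unit vector.

(b) Update OPT as follows: OPT $= \max\{\text{OPT}, \mu_1^2\}$.

3. Return OPT.

Figure 2: The algorithm runs in time exponential in $\|Q\|_F/\delta$.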

(or $\|Q\|_{LOCC}$ [MWW09]²) for computing OptSep$(Q)$ with additive error $\delta$. A similar running time $\exp(O(\log^2(d) \, \delta^{-2} \|Q\|_F^2))$ was obtained in [BCY11] using some known results in quantum information theory (i.e., the semidefinite programming for finding symmetric extensions [DPS04] and an improved quantum de Finetti-type bound).

By contrast, our algorithm makes no use of any of the advanced tools above and only utilizes fundamental operations of matrices. Intuitively, in order to approximate the optimum value to precision $\delta$, one only needs to look at the eigenspace of eigenvalues greater than $\delta$, the dimension of which is no more than $\|Q\|_F^2/\delta^2$. Nevertheless, naively enumerating density operators over that subspace doesn't work since one cannot detect the separability of those density operators. We circumvent this difficulty by making nontrivial use of the Schmidt decomposition of bipartite pure states.
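The dimension bound follows from $\|Q\|_F^2 = \sum_t \lambda_t^2$: every eigenvalue above $\delta$ contributes more than $\delta^2$ to the sum. A small numerical check in Python, on a random positive semidefinite matrix chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(16, 16))
Q = G @ G.T / 16                       # a random positive semidefinite matrix
delta = 0.5

eigenvalues = np.linalg.eigvalsh(Q)
large_count = int(np.sum(eigenvalues > delta))          # dimension of the kept eigenspace
frobenius_bound = np.linalg.norm(Q, "fro") ** 2 / delta ** 2
```

Here `large_count` never exceeds `frobenius_bound`, whatever the matrix, so the size of the enumeration depends on $\|Q\|_F/\delta$ rather than on the ambient dimension.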

Finally, as mentioned in the introduction, we admit that other results in the original paper [BCY11] do not follow from our algorithm and our method cannot be seen as a replacement of the kernel technique of that paper. Also our method does not extend to the $k$-partite version as there is no Schmidt decomposition in that case.

Recall the optimization problem we are interested in is equivalent to the following one.

$$\max: \ \langle Q, \rho\rangle \quad \text{s.t.} \quad \rho = |u\rangle\langle u| \otimes |v\rangle\langle v|, \ |u\rangle \in \mathcal{A}_1, \ |v\rangle \in \mathcal{A}_2.$$

Theorem 8. Given any positive semidefinite $Q$ over $\mathcal{A}_1 \otimes \mathcal{A}_2$ (of dimension $d \times d$) and $\delta > 0$, the algorithm in Fig. 2 approximates the optimal value OptSep$(Q)$ with additive error $\delta$ with running time $\exp(O(\log(d) + \delta^{-2} \|Q\|_F^2 \ln(\|Q\|_F/\delta)))$.

Proof. We first prove the correctness of the algorithm. The analysis will mainly be divided into two parts. Let $S_\epsilon = \mathrm{span}\{|\Psi_t\rangle \mid t \in \Gamma_\epsilon\}$. The first part shows it suffices to only consider vectors inside

² This follows easily from the fact $\|Q\|_F = O(\|Q\|_{LOCC})$ [MWW09], where $\|Q\|_{LOCC}$ stands for the LOCC norm of the operator $Q$.
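the subspace $S_\epsilon$ for approximating OptSep$(Q)$ with additive error $\delta$. The second part demonstrates that our algorithm in Fig. 2 approximates the optimal value obtained by only considering vectors in $S_\epsilon$. Precisely, since $\{|\Psi_t\rangle\}$ forms a basis, one has $|u\rangle|v\rangle = \sum_{t \in \Gamma_\epsilon} \beta_t |\Psi_t\rangle + \sum_{t \notin \Gamma_\epsilon} \beta_t |\Psi_t\rangle$ where $\beta$ is a unit vector in $\mathbb{C}^{d^2}$. Then we have

$$\langle Q, |u\rangle\langle u| \otimes |v\rangle\langle v| \rangle = \underbrace{\sum_{t \in \Gamma_\epsilon} \lambda_t |\beta_t|^2}_{(I)} + \underbrace{\sum_{t \notin \Gamma_\epsilon} \lambda_t |\beta_t|^2}_{(II)},$$

where the term (II) is obviously bounded by $\delta/2$ (i.e., $\sum_{t \notin \Gamma_\epsilon} \lambda_t |\beta_t|^2 \le \delta/2$). For the term (I), maximizing it is equivalent to computing OptSep$(\tilde{Q})$ where $\tilde{Q} = \sum_{t \in \Gamma_\epsilon} \lambda_t |\Psi_t\rangle\langle\Psi_t|$. Namely, small eigenvalues are truncated in $\tilde{Q}$. Now observe the following identity.

$$\max_{|u\rangle|v\rangle} \langle \tilde{Q}, |u\rangle\langle u| \otimes |v\rangle\langle v| \rangle = \max_{|u\rangle|v\rangle} \sum_{t \in \Gamma_\epsilon} \lambda_t \, |\langle u|\langle v|\Psi_t\rangle|^2 = \max_{|u\rangle|v\rangle} \|\gamma^{u,v}\|^2$$

$$= \max_{|u\rangle|v\rangle} \ \max_{\alpha \in B(\mathbb{C}^{|\Gamma_\epsilon|}, \|\cdot\|)} \Big| \sum_{t \in \Gamma_\epsilon} \alpha_t^* \sqrt{\lambda_t} \, \langle u|\langle v|\Psi_t\rangle \Big|^2 = \max_{|u\rangle|v\rangle} \ \max_{\alpha \in B(\mathbb{C}^{|\Gamma_\epsilon|}, \|\cdot\|)} |\langle u|\langle v|\phi_\alpha\rangle|^2$$

$$= \max_{\alpha \in B(\mathbb{C}^{|\Gamma_\epsilon|}, \|\cdot\|)} \ \max_{|u\rangle|v\rangle} |\langle u|\langle v|\phi_\alpha\rangle|^2,$$

where $\gamma^{u,v} \in \mathbb{C}^{|\Gamma_\epsilon|}$ and $\gamma^{u,v}_t = \sqrt{\lambda_t} \, \langle u|\langle v|\Psi_t\rangle$ for each $t \in \Gamma_\epsilon$. The second line comes from the duality of the Euclidean norm (i.e., $\|y\| = \max_{\|z\| \le 1} |\langle z|y\rangle|$). The third line comes by exchanging the positions of the two maximizations. We then make use of the following well-known fact.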

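Fact ([NC00]). For any bipartite vector $|\psi\rangle$ with the Schmidt decomposition

$$|\psi\rangle = \sum_i \mu_i |u_i\rangle |v_i\rangle,$$

where $\mu_1 \ge \mu_2 \ge \cdots$ and $\{u_i\}$, $\{v_i\}$ are orthogonal bases, one has $\max_{|u\rangle|v\rangle} |\langle u|\langle v|\psi\rangle| = \mu_1$, and the maximum value is obtained by choosing $|u\rangle|v\rangle$ to be $|u_1\rangle|v_1\rangle$.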
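The fact is easy to test numerically, since the Schmidt decomposition of a bipartite vector is the singular value decomposition of its coefficient matrix. The sketch below assumes the row-major convention in which $|i\rangle|j\rangle$ has index $i \cdot d_2 + j$ (the convention shared by `numpy.kron` and `reshape`); the dimensions 3 and 4 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2 = 3, 4
psi = rng.normal(size=d1 * d2) + 1j * rng.normal(size=d1 * d2)

# Schmidt decomposition = SVD of the d1 x d2 coefficient matrix.
U, mu, Vh = np.linalg.svd(psi.reshape(d1, d2))
u1, v1 = U[:, 0], Vh[0, :]
best_overlap = abs(np.vdot(np.kron(u1, v1), psi))   # |<u1|<v1|psi>|

def random_unit(n):
    x = rng.normal(size=n) + 1j * rng.normal(size=n)
    return x / np.linalg.norm(x)

# No random product state should beat the top Schmidt coefficient.
trial_overlaps = [
    abs(np.vdot(np.kron(random_unit(d1), random_unit(d2)), psi)) for _ in range(200)
]
```

Note that `psi` is deliberately left unnormalized, matching the remark in Fig. 2 that $|\phi_\alpha\rangle$ need not be a unit vector.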

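It is not hard to see that our algorithm computes exactly the term on the third line except that we replace the unit ball by its $\epsilon$-net. However, this won't incur too much extra error. For any $\alpha \in B(\mathbb{C}^{|\Gamma_\epsilon|}, \|\cdot\|)$, there exists $\tilde{\alpha} \in \mathcal{M}_\epsilon$ such that $\|\alpha - \tilde{\alpha}\| \le \epsilon$. Thus, the extra error incurred is $\big| \, |\langle u|\langle v|\phi_\alpha\rangle|^2 - |\langle u|\langle v|\phi_{\tilde{\alpha}}\rangle|^2 \, \big|$, and can be bounded by

$$\left( \| |\phi_\alpha\rangle \| + \| |\phi_{\tilde{\alpha}}\rangle \| \right) |\langle u|\langle v|\phi_{\alpha - \tilde{\alpha}}\rangle| \le 2 \max_{\|\alpha\| \le 1} \|\phi_\alpha\| \ \max_{\|\alpha'\| \le \epsilon} \|\phi_{\alpha'}\| \le 2\sqrt{\|Q\|_F} \times \epsilon \sqrt{\|Q\|_F} \le \delta/2,$$

where $\max_{\|\alpha\| \le \epsilon'} \|\phi_\alpha\| \le \epsilon' \sqrt{\|Q\|_F}$ for any $\epsilon' > 0$ can be verified directly, and therefore the total additive error is bounded by $\delta/2 + \delta/2 = \delta$.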

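Finally, let us turn to the analysis of the efficiency of this algorithm. The spectral decomposition in the first step takes polynomial time in $d$, and so does the calculation of $|\phi_\alpha\rangle$. The generation of the $\epsilon$-net of the unit ball is standard and can be done in $O((1 + \frac{2}{\epsilon})^{2|\Gamma_\epsilon|}) \times \mathrm{poly}(|\Gamma_\epsilon|)$ time, where $\epsilon$ is the net parameter of Step 2 and the exponent reflects the real dimension $2|\Gamma_\epsilon|$ of the ball. The last operation, finding the Schmidt decomposition, is equivalent to a singular value decomposition, and thus can be done in polynomial time in $d$ as well. Also note $|\Gamma_\epsilon| \le \min\{d^2, 4\|Q\|_F^2/\delta^2\}$, since every retained eigenvalue exceeds $\delta/2$. To sum up, the total running time of the algorithm is upper bounded by $O((1 + \frac{2}{\epsilon})^{2|\Gamma_\epsilon|}) \times \mathrm{poly}(d)$, or equivalently $\exp(O(\log(d) + \delta^{-2} \|Q\|_F^2 \ln(\|Q\|_F/\delta)))$.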

□ 
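The algorithm of Fig. 2 can be sketched in a few lines of NumPy for very small instances. This is an illustrative sketch under stated assumptions, not the paper's implementation: the net is a hypothetical cubic grid over the real and imaginary parts with spacing chosen so that its covering radius is the net parameter, grid points are projected onto the unit ball so that the output never exceeds OptSep$(Q)$, the threshold $\delta/2$ and net parameter $\delta/(4\|Q\|_F)$ are taken from the analysis above, and the function name `optsep_eps_net` is made up. The grid has exponentially many points, so only a handful of large eigenvalues is feasible.

```python
import numpy as np

def optsep_eps_net(Q, d, delta):
    """Approximate OptSep(Q) for PSD Q on C^d (x) C^d within additive error delta."""
    lam, vecs = np.linalg.eigh(Q)                   # Step 1: spectral decomposition
    keep = lam > delta / 2                          # indices with large eigenvalues
    lam, vecs = lam[keep], vecs[:, keep]
    m = lam.size
    if m == 0:
        return 0.0
    eps = delta / (4 * np.linalg.norm(Q, "fro"))    # Step 2: net parameter
    h = 2 * eps / np.sqrt(2 * m)                    # grid spacing, covering radius eps
    k = int(np.ceil(1 / h))
    axis = h * np.arange(-k, k + 1)
    grid = np.stack(np.meshgrid(*[axis] * (2 * m), indexing="ij"), axis=-1)
    grid = grid.reshape(-1, 2 * m)
    alphas = grid[:, :m] + 1j * grid[:, m:]
    norms = np.linalg.norm(alphas, axis=1)
    near = norms <= 1 + eps
    alphas, norms = alphas[near], norms[near]
    alphas = alphas / np.maximum(norms, 1.0)[:, None]   # project onto the unit ball
    # Step 2(a): rows are |phi_alpha> = sum_t conj(alpha_t) sqrt(lam_t) |Psi_t>.
    phis = (alphas.conj() * np.sqrt(lam)) @ vecs.T
    mu = np.linalg.svd(phis.reshape(-1, d, d), compute_uv=False)
    return float(np.max(mu[:, 0] ** 2))             # Step 2(b) and Step 3

# Toy instance: the projector onto span{|00>, |11>}, whose OptSep value is 1.
Q = np.diag([1.0, 0.0, 0.0, 1.0])
opt = optsep_eps_net(Q, d=2, delta=0.5)
```

The singular values are computed in one batched call, which plays the role of the per-point Schmidt decomposition in Step 2(a).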
Remarks. One can also apply the observation in the introduction to parallelize the computation in this case. However, the size of the $\epsilon$-net here will depend on some parameter (i.e., $\|Q\|_F/\delta$) other than the input.
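1. Let $\gamma = \frac{\epsilon}{8Mw}$ and $T = \left\lceil \frac{\ln d}{\gamma^2} \right\rceil$. Also let $W^{(1)} = \mathbb{1}_{\mathcal{X}}$, $d = \dim(\mathcal{X})$.

2. Repeat for each $t = 1, \ldots, T$:

(a) Let $\rho^{(t)} = W^{(t)} / \mathrm{Tr}\, W^{(t)}$ and compute $q(\rho^{(t)})$ from its defining equation. One can then rewrite the vector $p - q(\rho^{(t)})$ in the polar form $(c_1^{(t)} e^{i\varphi_1^{(t)}}, c_2^{(t)} e^{i\varphi_2^{(t)}}, \ldots, c_M^{(t)} e^{i\varphi_M^{(t)}})$ and choose $z^{(t)} = (e^{i\varphi_1^{(t)}}, e^{i\varphi_2^{(t)}}, \ldots, e^{i\varphi_M^{(t)}})$. It is not hard to see such $z^{(t)}$ maximizes $\mathrm{Re}\langle p - q(\rho^{(t)}), z \rangle$.

(b) Choose $N^{(t)}$ to be

$$N^{(t)} = \mathrm{Re}\langle p, z^{(t)} \rangle \, \mathbb{1}_{\mathcal{X}} - \frac{1}{2}\left(Q^{(t)} + Q^{(t)*}\right) + 2Mw \, \mathbb{1}_{\mathcal{X}},$$

where $Q^{(t)} = \sum_{i=1}^{M} e^{i\varphi_i^{(t)}} Q_i$.

(c) Update the weight matrix as follows: $W^{(t+1)} = \exp\left(-\gamma \sum_{\tau=1}^{t} N^{(\tau)}\right)$.

3. Return $\tilde{d} = \frac{1}{T} \sum_{t=1}^{T} \langle \rho^{(t)}, N^{(t)} - 2Mw \, \mathbb{1}_{\mathcal{X}} \rangle$.

Figure 3: An algorithm that approximates $\mathrm{dis}(p)$ with additive error $\epsilon$.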

A Proof of Lemma 3

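Theorem 9 (Multiplicative weights update method; see Ref. [Kal07, Theorem 10]). Fix $\gamma \in (0, 1/2)$. Let $N^{(1)}, \ldots, N^{(T)}$ be arbitrary $d \times d$ "loss" matrices with $0 \preceq N^{(t)} \preceq \alpha \mathbb{1}$. Let $W^{(1)}, \ldots, W^{(T)}$ be $d \times d$ "weight" matrices given by

$$W^{(1)} = \mathbb{1}, \qquad W^{(t+1)} = \exp\left(-\gamma (N^{(1)} + \cdots + N^{(t)})\right).$$

Let $\rho^{(1)}, \ldots, \rho^{(T)}$ be density operators obtained by normalizing each $W^{(1)}, \ldots, W^{(T)}$ so that $\rho^{(t)} = W^{(t)} / \mathrm{Tr}\, W^{(t)}$. For all density operators $\rho$ it holds that

$$\frac{1}{T} \sum_{t=1}^{T} \langle \rho^{(t)}, N^{(t)} \rangle \le \Big\langle \rho, \frac{1}{T} \sum_{t=1}^{T} N^{(t)} \Big\rangle + \alpha \left( \gamma + \frac{\ln d}{\gamma T} \right).$$

Note that Theorem 9 holds for all choices of loss matrices $N^{(1)}, \ldots, N^{(T)}$, including those for which each $N^{(t)}$ is chosen adversarially based upon $W^{(1)}, \ldots, W^{(t)}$. This adaptive selection of loss matrices is typical in implementations of the MMW. Consider the algorithm shown in Fig. 3.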
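The update rule and the adaptive choice of losses can be illustrated with a short NumPy sketch. It is restricted to the normalized case $\alpha = 1$ (losses with $0 \preceq N^{(t)} \preceq \mathbb{1}$), for which the bound $\gamma + \ln d/(\gamma T)$ on the regret follows from the standard potential argument; the adversary used here (the projector onto the top eigenvector of the current $\rho^{(t)}$) and the helper names are assumptions made only for the demonstration.

```python
import numpy as np

def herm_exp(H):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    lam, V = np.linalg.eigh(H)
    return (V * np.exp(lam)) @ V.conj().T

def mmw(loss_fn, d, gamma, T):
    """Run T rounds of matrix multiplicative weights with adaptively chosen losses."""
    total = np.zeros((d, d), dtype=complex)
    rhos, losses = [], []
    for _ in range(T):
        W = herm_exp(-gamma * total)          # W^(t) = exp(-gamma * sum of past losses)
        rho = W / np.trace(W).real            # normalize to a density operator
        N = loss_fn(rho)                      # loss may depend on the current state
        rhos.append(rho)
        losses.append(N)
        total = total + N
    return rhos, losses

def top_eigenprojector(rho):
    """Adversarial loss: projector onto the eigenvector rho weights most heavily."""
    _, V = np.linalg.eigh(rho)
    v = V[:, -1]
    return np.outer(v, v.conj())

d, gamma, T = 4, 0.1, 200
rhos, losses = mmw(top_eigenprojector, d, gamma, T)
avg_loss = np.mean([np.trace(r @ N).real for r, N in zip(rhos, losses)])
best_fixed = np.linalg.eigvalsh(sum(losses) / T)[0]   # min over fixed density operators
regret_bound = gamma + np.log(d) / (gamma * T)
```

The average loss of the adaptive sequence stays within `regret_bound` of the best fixed density operator in hindsight, which is the shape of guarantee the proof of the lemma relies on.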

Lemma 10 (Restated Lemma 3). Given any point $p \in$ Raw-$(M, w)$ and $\epsilon > 0$, the algorithm in Fig. 3 approximates $\mathrm{dis}(p)$ with additive error $\epsilon$. Namely, the return value $\tilde{d}$ of this algorithm satisfies

$$\tilde{d} - \epsilon \le \mathrm{dis}(p) \le \tilde{d} + \epsilon.$$

Moreover, the algorithm runs in $\mathrm{poly}(d, M, w, 1/\epsilon)$ time. Furthermore, if $d$ is considered as the input size and $M, w, 1/\epsilon \in O(\text{poly-log}(d))$, this algorithm is also efficient in parallel, namely, inside NC.
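Proof. The algorithm is a typical application of the matrix multiplicative weights update method. In order to make use of Theorem 9, we first need to show $N^{(t)}$ is bounded for each $t$. Since $p \in$ Raw-$(M, w)$ and $\|z^{(t)}\|_\infty \le 1$, by Hölder's inequality we have

$$|\mathrm{Re}\langle p, z^{(t)} \rangle| \le \|p\|_1 \|z^{(t)}\|_\infty \le M \|p\|_\infty \|z^{(t)}\|_\infty \le Mw.$$

Furthermore we have

$$\|Q^{(t)}\|_{op} = \Big\| \sum_{i=1}^{M} e^{i\varphi_i^{(t)}} Q_i \Big\|_{op} \le \sum_{i=1}^{M} \|Q_i\|_{op} \le Mw.$$

Thus by the triangle inequality, one can easily find

$$0 \preceq N^{(t)} \preceq 4Mw \, \mathbb{1}_{\mathcal{X}}.$$

Then we can make use of Theorem 9. Immediately, for any $\rho \in \mathrm{D}(\mathcal{X})$, we have

$$\frac{1}{T} \sum_{t=1}^{T} \langle \rho^{(t)}, N^{(t)} \rangle \le \Big\langle \rho, \frac{1}{T} \sum_{t=1}^{T} N^{(t)} \Big\rangle + \alpha \left( \gamma + \frac{\ln d}{\gamma T} \right).$$

Substitute $\alpha = 4Mw$, $\gamma = \frac{\epsilon}{8Mw}$ and $T = \left\lceil \frac{\ln d}{\gamma^2} \right\rceil$. Also consider the identity $\langle \rho^{(t)}, N^{(t)} - 2Mw \, \mathbb{1}_{\mathcal{X}} \rangle = \mathrm{Re}\langle p - q(\rho^{(t)}), z^{(t)} \rangle$. Then we have for any $\rho \in \mathrm{D}(\mathcal{X})$,

$$\tilde{d} = \frac{1}{T} \sum_{t=1}^{T} \langle \rho^{(t)}, N^{(t)} - 2Mw \, \mathbb{1}_{\mathcal{X}} \rangle \le \mathrm{Re}\Big\langle p - q(\rho), \frac{1}{T} \sum_{t=1}^{T} z^{(t)} \Big\rangle + \epsilon. \qquad (5)$$

Consider the equilibrium value form of $\mathrm{dis}(p)$ in Equ. (2). For each $\rho^{(t)}$, we always find the $z^{(t)}$ that maximizes $\mathrm{Re}\langle p - q(\rho^{(t)}), z \rangle$. Hence, $\mathrm{dis}(p) \le \tilde{d}$. Let $\rho^*$ be any equilibrium point of the equilibrium value in Equ. (2). By substituting such $\rho^*$ into Equ. (5) we have

$$\tilde{d} \le \mathrm{Re}\Big\langle p - q(\rho^*), \frac{1}{T} \sum_{t=1}^{T} z^{(t)} \Big\rangle + \epsilon \le \mathrm{dis}(p) + \epsilon.$$

This completes the proof of the correctness of the algorithm. Note that each step in the algorithm only contains fundamental operations of matrices and vectors, which can be done in polynomial time in $M, d$. Also there are in total $O(T) = \mathrm{poly}(\ln d, M, w, 1/\epsilon)$ steps, thus the whole algorithm can be executed in $\mathrm{poly}(d, M, w, 1/\epsilon)$ time. Moreover, given the fact that fundamental operations of matrices and vectors also admit efficient parallel algorithms (i.e., NC algorithms), one can easily compose the NC circuits of each step and obtain an NC algorithm as a whole if the total number of steps $T$ is not too large. Precisely, if $M, w, 1/\epsilon \in O(\text{poly-log}(d))$ and $d$ is considered as the input size, this algorithm is also efficient in parallel. □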